1 Introduction

From a Bayesian perspective, uncertainty in a random quantity, say $X$, is modeled in terms of $P(\cdot)$, a (joint) probability distribution of (all the components of) $X$. Parametric models offer a convenient way to specify such a joint distribution; see, e.g., Lindley (1971), Box and Tiao (1973), or Berger (1985). For example, let $\theta$ denote an unknown parameter taking a value in $\Theta$, the parameter space, and for each $\theta \in \Theta$ let the conditional distribution of $X$ given $\theta$ be specified by $P(\cdot \mid \theta)$. These distributions can be given in terms of densities which, regarded as functions of $\theta$ for fixed $X$, are known as the likelihood function. The unconditional distribution of $X$ is then obtained by specifying $\pi(\cdot)$, a probability distribution on $\Theta$.

While apparently easier than specifying the unconditional distribution of $X$ directly, this approach raises two problems. First, how should one choose the likelihood function? Common practice favors likelihood functions that have been traditional choices in the past or that satisfy exogenous needs for mathematical tractability or simplicity. This approach is natural for investigators comfortable with frequentist probability models, who assume that there is some fixed, unknown parameter which fully characterizes the distribution of the observed random variables. However, it raises a second problem: the specification of the prior distribution. Bayesian approaches are often abandoned in specific problems because specifying the prior is so daunting. Much of the difficulty is due to the abstract nature of the unknown parameter.

In this paper, we address the problem of defining parameters and specifying parametric probability models. Section 2 provides a formal development of Bayesian parametric models. For any class $\mathcal{P}$ of probability models for specific data $X$, a parameter is defined as any measurable function of the data which, as a conditioning variable, leads to consensus beliefs about the randomness in the data. That is, the conditional distribution of $X$ given the parameter does not vary with the model $P \in \mathcal{P}$. The prior distribution on the parameter then has a simple definition as the marginal distribution on the collections of outcomes of $X$ defined by the values of the parameter, that is, of the measurable function of $X$. Similarly, the likelihood function for the parameter has a natural definition in terms of conditional expectations of (measurable) functions of the data given the σ-field on the sample space of the data generated by the parameter. With definitions of a parameter, its prior distribution and its likelihood function, we are able to construct the "Bayesian parametric representation" of any Bayesian probability model. This amounts to a generalization of de Finetti's representation to data that need not form an infinite sequence of exchangeable random variables.

Section 2 also proposes two useful concepts for Bayesian modeling. The first, "minimal parameters," characterizes parametrizations of a class of probability models that are most parsimonious, in the sense that no unobservable quantities are needed to specify the models and the extent of prior introspection necessary to identify specific models is minimized. The second concept, a "maximal class," comprises the totality of Bayesian probability models consistent with a given parametrization. The robustness of a class of Bayesian parametric probability models is determined in part by the range of subjective beliefs which are consistent with the parametrization.

Section 3 demonstrates the practical scope of the theory in the context of several applications to finite sequences of random variables, and Section 4 treats infinite sequences as limits of finite ones.

The Bayesian perspective does, however, provide some insight into the existence of parametric models when the random quantity is a sequence of exchangeable random variables. De Finetti's (1937) representation theorem (and its various generalizations, e.g., Hewitt and Savage, 1955) says that if the sequence of exchangeable random variables is potentially infinite in size, then there exists a parametric model for the sequence under which its components are conditionally independent and identically distributed given the parameter. Unfortunately, the theory does not address the issue of how such models and their underlying parameters should be defined. Moreover, the sequence must be potentially infinite in length for the parametric probability model to be well defined. In contrast, our development leads to definitions of parameters and parametric models for any random variable or sequence of random variables.

The notion of treating functions of the data, i.e., statistics, as parameters is not new. Lauritzen (1974) discusses the problem of predicting the value of a minimal totally sufficient statistic for a sequence ("population") given the observation of a subsequence ("sample"). Then, the value of the "sufficient statistic in the larger part of the population will play the role of an unknown 'parameter'. This 'parameter' is then estimated by the method of maximum likelihood" (pp. 129-130). He notes, however, that this likelihood approach to inference may "give unreasonable results" (p. 130) because the maximum likelihood predictions will not vary with changes in the family of distributions $\mathcal{P}$, so long as the minimal totally sufficient statistic and the conditional distributions given it remain the same.

Lauritzen (1974) also introduces the notion of "extreme models" (p. 132). These models correspond to our Bayesian models in which the prior distributions are degenerate at particular parameter values. In these cases, he notes that "the 'parameter' is in a certain sense equal to the value of the statistic when calculated over the entire population" (p. 132). His likelihood approach leads to the conclusion that such a class $\mathcal{P}$ "is as broad as one would want" since it consists of all models which might be true in any sense.

This interpretation of model adequacy, however, highlights what distinguishes the Bayesian approach. Without knowledge of which model is true, the Bayesian solution reflects this uncertainty in terms of a probability distribution over possible models. The problems Lauritzen notes with maximum likelihood predictions being unaffected by changes in the class $\mathcal{P}$ are resolved coherently with Bayesian models, where changes in $\mathcal{P}$ lead to changes in the definition of the parameter and/or the specification of prior distributions on the parameter.


Our development of Bayesian parametric models is closely related to Dawid's (1979, 1982) "inter-subjective models."¹ When a sufficient statistic $T$ can be found for a class $\mathcal{P}$ of probability models, Dawid calls $T$ an "I-parameter" for the family of conditional distributions given $T$. Dawid notes that "it will often be appropriate to embed the class of events under consideration in a larger, real or conceptual, class before constructing the I-model" (p. 221), and that "there may be many different but equally satisfactory models" (p. 219). Arguing subjectively for the applicability of propensity-type models, he details specific I-models for exchangeable binary sequences and orthogonally invariant measures, and discusses the relevant theory of I-models for sequences with repetitive structure, extending the theory of Lauritzen (1974).

The major distinction between our approach and Dawid's is that we focus on the specification of models and the definition of parameters for finite sequences of observable random quantities. We develop models for a sequence of infinite length as the limiting case of a sequence of models for finite sequences. Parameters defined as measurable functions of the observables are then seen to be interpretable for infinite sequences only if the limit of the sequence of measurable functions exists in some meaningful sense. When the infinite-length sequence is taken as primary, there may be considerable flexibility in defining parameters in terms of unobservable variables. Rather than view alternative parametrizations as equally satisfactory, we advocate the concept of minimal parametrizations as a guiding principle for defining model parameters parsimoniously in terms of observables.


2 Bayesian Parametric Models

Let $X$ be a random, observable quantity.² Denote the sample space of $X$ by $\mathcal{X}$.


¹ We thank Richard Barlow for bringing Dawid's work to our attention.

² Often $X$ is a sequence of random variables, i.e., $X = (X_1, \ldots, X_N)$. For example, $X$ could represent the outcomes of a sequence of $N$ coin tosses or the states of a discrete-time dynamic system as it evolves from time 1 to time $N$. This interpretation of $X$ as a finite sequence of random variables is implicit in this section and will be explicit in Section 3.


From a Bayesian perspective, uncertainty in the random quantity $X$ is modeled in terms of $P(\cdot)$, a (joint) probability distribution of (all the components of) $X$. To define such a distribution, let $\mathcal{F}$ denote a σ-field on $\mathcal{X}$ with respect to which $X$ is measurable. The class $\mathcal{F}$ is the set of all events pertaining to $X$ for which the Bayesian can specify probabilities through introspection (and deduction applying the probability calculus).

Let $\mathcal{P} = \{P\}$ be a class of probability models on the measurable space $(\mathcal{X}, \mathcal{F})$ for a group of Bayesians. That is, $\mathcal{P}$ constitutes the Bayesians' spectrum of possible choices for a distribution of $X$. The size of such a class will depend on the degree of possible heterogeneity in the Bayesians' (prior) beliefs about $X$. While $\mathcal{P}$ could be so general as to encompass all distributions on $(\mathcal{X}, \mathcal{F})$, a non-trivial special case is the class of all exchangeable distributions.

We shall define a parameter $\theta$ for a given class $\mathcal{P}$ of Bayesian models as a measurable function $\theta : \mathcal{X} \to \Theta$ such that the conditional distribution of $X$ given $\theta$ is the same for all models $P \in \mathcal{P}$. Such a definition makes the parameter an operationally defined random variable, that is, one defined in terms of observables alone.

A parameter can be found by first introducing a subfield $\mathcal{F}_\theta$ of the basic field $\mathcal{F}$ such that the conditional expectation, relative to $\mathcal{F}_\theta$, of the indicator of any event in $\mathcal{F}$ is the same for all $P \in \mathcal{P}$. Any function on $\mathcal{X}$ that induces such a subfield can then be used as a parameter. Formally, with $1_A$ denoting the indicator of an event $A$ and $E_P$ denoting expectation with respect to the distribution $P$, we propose


Definition: An $\mathcal{F}$-measurable function $\theta$ is a parameter for the class $\mathcal{P}$ if, for any $A \in \mathcal{F}$, the conditional expectation $E_P(1_A \mid \mathcal{F}_\theta)$ can be defined to be equal for all $P \in \mathcal{P}$.
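On a finite sample space with $\mathcal{F}$ the set of all subsets, the definition can be checked directly: $\theta$ is a parameter for a finite class $\mathcal{P}$ when every model gives the same conditional distribution of $X$ on each level set $\{\theta = t\}$ to which it assigns positive probability (on null sets any version may be chosen). The Python sketch below is an illustration under these assumptions only; the helper names and the two exchangeable models on $\{0,1\}^3$ are hypothetical. It finds that the number of 1s, and also the identity map, are parameters for the two models, while the first coordinate alone is not.

```python
from fractions import Fraction
from itertools import product

# Finite sample space: all 0-1 sequences of length 3 (F = all subsets).
SPACE = list(product((0, 1), repeat=3))

def conditional_given(model, theta, t):
    """Conditional distribution of X given theta(X) = t, or None if P(theta = t) = 0."""
    level = [x for x in SPACE if theta(x) == t]
    mass = sum(model[x] for x in level)
    if mass == 0:
        return None  # any version of the conditional will do on a null set
    return {x: model[x] / mass for x in level}

def is_parameter(models, theta):
    """True if all models agree on X | theta wherever the conditioning event has positive mass."""
    for t in {theta(x) for x in SPACE}:
        conds = [c for c in (conditional_given(m, theta, t) for m in models) if c is not None]
        if any(c != conds[0] for c in conds[1:]):
            return False
    return True

def exchangeable(weights):
    """Hypothetical exchangeable model: weights[k] = P(number of ones = k),
    spread uniformly over the sequences with k ones."""
    counts = {k: sum(1 for x in SPACE if sum(x) == k) for k in range(4)}
    return {x: Fraction(weights[sum(x)]) / counts[sum(x)] for x in SPACE}

P1 = exchangeable([Fraction(1, 4)] * 4)
P2 = exchangeable([Fraction(1, 2), Fraction(1, 8), Fraction(1, 8), Fraction(1, 4)])

print(is_parameter([P1, P2], theta=sum))             # True: number of ones
print(is_parameter([P1, P2], theta=lambda x: x[0]))  # False: first coordinate only
```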

We shall refer to the field $\mathcal{F}_\theta$ as the parameter field for the class $\mathcal{P}$. Any subfield of $\mathcal{F}$ that differs from $\mathcal{F}_\theta$ only by sets of probability zero under every $P \in \mathcal{P}$ can serve equally well as a parameter field for the class; in practice, any one of these fields can be used. Moreover, there are typically many functions on $\mathcal{X}$ that induce the same subfield. Therefore, there will be many functions $\theta$ that may serve as a parameter.


The definition of the parameter provides an $\mathcal{F}_\theta$-measurable function $E(1_A \mid \mathcal{F}_\theta)$, for any $A \in \mathcal{F}$, which does not depend on the particular $P$ in $\mathcal{P}$; that is, an $\mathcal{F}_\theta$-measurable function which is the same for every Bayesian in the group. This function can be used to provide a generalized definition of the likelihood for a class of models.


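Definition: The parametric likelihood function for the class $\mathcal{P}$ with respect to a parameter $\theta$ is a function $L(\cdot, \cdot) : \mathcal{F} \times \Theta \to [0, 1]$ whose values are given by

$$L(A, \theta') = E(1_A \mid \mathcal{F}_\theta)(x)$$

for almost all $x$ such that $\theta(x) = \theta'$.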

This definition is consistent with conventional usage when $A$ consists of a single component of $X$.

Given a parameter $\theta$ for the class $\mathcal{P}$, any distribution $P \in \mathcal{P}$ specifies a distribution on $(\mathcal{X}, \mathcal{F}_\theta)$, which we call the prior distribution of $\theta$ corresponding to $P$. Identifying the prior distribution with a distribution on observables contrasts with its usual interpretation as a distribution on an abstract parameter space with σ-field $\mathcal{G}$. However, the latter can be formalized with the present definition of $\theta$ as a measurable function of $X$, with

$$\mathcal{G} = \{\theta(A) : A \in \mathcal{F}_\theta\}.$$

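Consider then

Definition: The prior distribution of a parameter $\theta$ for a model $P \in \mathcal{P}$ is the measure $\pi$ on $(\Theta, \mathcal{G})$ induced by $P$ on $(\mathcal{X}, \mathcal{F}_\theta)$, that is,

$$\pi(B) = P(\theta^{-1}(B)), \qquad B \in \mathcal{G}.$$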

These definitions allow us to write a parametric representation for any Bayesian model, generalizing de Finetti's representation for exchangeable sequences to arbitrary random variables. The identity for conditional expectations

$$P(A) = E_P[E(1_A \mid \mathcal{F}_\theta)],$$

for any $A \in \mathcal{F}$, when expressed in terms of the parametric likelihood and prior distribution, provides

Definition: For any parameter $\theta$ of a class $\mathcal{P}$ of models on $(\mathcal{X}, \mathcal{F})$, the parametric representation of the probability of an event $A \in \mathcal{F}$ is

$$P(A) = \int_\Theta L(A, \theta)\, \pi(d\theta).$$

The likelihoods can, therefore, be interpreted as the extreme probability assignments pertaining to the class $\mathcal{P}$, in the sense that any probability assignment in the class can be expressed as a convex mixture of the likelihoods.
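For a single model on a finite sample space, the prior, the likelihood and the parametric representation reduce to finite sums: $\pi(t) = P(\theta^{-1}(\{t\}))$, $L(A, t) = P(A \cap \theta^{-1}(\{t\}))/\pi(t)$ whenever $\pi(t) > 0$, and $P(A) = \sum_t L(A, t)\,\pi(t)$. The sketch below computes these quantities; the model, the event and the function names are hypothetical. For one model any function $\theta$ yields such a representation; what the definition of a parameter adds is that $L$ is shared by every $P \in \mathcal{P}$.

```python
from fractions import Fraction

def prior(model, theta):
    """pi(t) = P(theta^{-1}({t})): the distribution induced on the parameter values."""
    pi = {}
    for x, p in model.items():
        pi[theta(x)] = pi.get(theta(x), 0) + p
    return pi

def likelihood(model, theta, event, t):
    """L(A, t) = E(1_A | F_theta) on the level set {theta = t}; requires pi(t) > 0."""
    level_mass = sum(p for x, p in model.items() if theta(x) == t)
    hit_mass = sum(p for x, p in model.items() if theta(x) == t and x in event)
    return hit_mass / level_mass

def represent(model, theta, event):
    """Parametric representation: P(A) = sum over t of L(A, t) * pi(t)."""
    return sum(likelihood(model, theta, event, t) * w
               for t, w in prior(model, theta).items() if w > 0)

# Hypothetical model on {0,1}^2 and parameter theta(x) = x1 + x2.
P = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 8),
     (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 4)}
theta = sum
A = {(0, 1), (1, 1)}  # event "second component equals 1"

print(represent(P, theta, A), sum(P[x] for x in A))  # 3/8 3/8
```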

A parameter always exists for any class $\mathcal{P}$. The identity function $\theta(X) = X$ is such that the conditional expectation of the indicator of any $A \in \mathcal{F}$ with respect to $\mathcal{F}_\theta$ is the indicator itself, since

$$E[1_A \mid \mathcal{F}_\theta] = E[1_A \mid \mathcal{F}] = 1_A.$$

It follows that this conditional expectation is equal for all $P$, no matter what $\mathcal{P}$ is, and hence that the identity is a parameter for any class $\mathcal{P}$. A class $\mathcal{P}$ may admit multiple parameters. For any two such parameters, say $\theta_1$ and $\theta_2$, the intersection $\mathcal{F}_\theta = \mathcal{F}_{\theta_1} \cap \mathcal{F}_{\theta_2}$ defines a parametrization which is no finer than those given by $\theta_1$ and $\theta_2$. The most parsimonious parametrization is obtained by taking the intersection over all parameter fields. This leads to:


Definition: A parameter $\theta$ is minimal for a class $\mathcal{P}$ if $\mathcal{F}_\theta = \bigcap \mathcal{F}_{\theta'}$, where the intersection is taken over all parameters $\theta'$ of $\mathcal{P}$.
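When $\mathcal{X}$ is finite and $\mathcal{F}$ is the set of all subsets, each field $\mathcal{F}_\theta$ is generated by the partition of $\mathcal{X}$ into level sets of $\theta$, and $\mathcal{F}_{\theta_1} \cap \mathcal{F}_{\theta_2}$ is generated by the finest partition that is coarser than both. A minimal sketch, assuming this finite setting and using hypothetical helper names, computes that partition with union-find by merging every pair of points that share a block of either partition. It only computes the intersection of two fields; whether the result is itself a parameter field still has to be checked against the definition.

```python
from itertools import product

def level_partition(space, theta):
    """Level sets {x : theta(x) = t}; on a finite space they generate F_theta."""
    blocks = {}
    for x in space:
        blocks.setdefault(theta(x), set()).add(x)
    return list(blocks.values())

def meet_of_fields(space, theta1, theta2):
    """Atoms of the intersection of F_theta1 and F_theta2: union-find merges points
    sharing a block of either partition, giving the finest common coarsening."""
    parent = {x: x for x in space}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for block in level_partition(space, theta1) + level_partition(space, theta2):
        first, *rest = block
        for y in rest:
            parent[find(y)] = find(first)

    atoms = {}
    for x in space:
        atoms.setdefault(find(x), set()).add(x)
    return [frozenset(a) for a in atoms.values()]

SPACE = list(product((0, 1), repeat=3))
identity_vs_count = meet_of_fields(SPACE, lambda x: x, sum)
print(sorted(len(a) for a in identity_vs_count))        # [1, 1, 3, 3]
print(len(meet_of_fields(SPACE, lambda x: x[0], sum)))  # 1 (the trivial field)
```

In the first call the intersection is the field generated by the sets of sequences with equal numbers of 1s; in the second, the first coordinate and the count share no non-trivial events.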

By defining parameters as measurable functions of the random variable, the correspondence between Bayesian models $P \in \mathcal{P}$ and prior measures $\pi$ on $\Theta$ is one-to-one. The relationship need not be "onto," however: some distributions $\pi$ on $\Theta$ may specify distributions which are not in $\mathcal{P}$. This will occur, for example, when the class $\mathcal{P}$ is not convex under probabilistic mixing.


Definition: A class $\mathcal{P}$ of Bayesian models is maximal with respect to a parameter $\theta$ if $\Pi$, the class of all distributions $\pi$ on $(\Theta, \mathcal{G})$, is isomorphic to $\mathcal{P}$.

A class $\mathcal{P}$ is maximal for a parameter $\theta$ only if $\theta$ is a minimal parameter. Such classes reflect the entire range of Bayesian beliefs which are consistent with the given minimal parametrization.

REMARKS: 

(2.1) Our definition of a parameter is similar to the frequentist definition of a sufficient statistic when the class $\mathcal{P}$ is indexed by a frequentist parameter $\phi \in \Phi$, the frequentist parameter space. However, the parameters in frequentist models are generally not measurable functions of the signal.³

(2.2) While there may be substantial divergence of opinion between the probability assessments of two Bayesians who choose their models from $\mathcal{P}$, a parameter characterizes a basis for consensus beliefs: conditional on the value of the parameter, they agree on the randomness in $X$.

(2.3) A minimal parametrization of a class $\mathcal{P}$ minimizes the extent of prior introspection necessary to identify a Bayesian probability model $P \in \mathcal{P}$; the measure $P$ need only be specified on the coarsest subfield $\mathcal{F}_\theta$ which still provides a basis for consensus opinion or full knowledge of the remaining randomness in $X$.

(2.4) There is no redundancy in the model specification when the parameter is minimal. The conditional distributions of $X \mid \theta'$ and $X \mid \theta''$ are distinct almost everywhere (for every $P \in \mathcal{P}$) for two distinct values $\theta' \neq \theta''$ of a minimal parameter. Otherwise, a parameter with a coarser σ-field could be defined which does not distinguish $\theta'$ and $\theta''$.

(2.5) A frequentist class $\mathcal{P}$ like that in Remark (2.1) would be inappropriate as a class of models for a group of Bayesians unless each Bayesian was completely certain about the randomness in the signal. Bayesians would typically prefer to choose probabilistic mixtures of distributions in such a class. Expressing beliefs as such a mixture indicates how the Bayesian framework enriches the frequentist approach. The Bayesian analogue to a frequentist class $\mathcal{P}$ would be $\mathcal{P}^*$, its convex hull under probabilistic mixtures. Then, every prior distribution $\pi$ on $\Phi$ would correspond to a distribution $P \in \mathcal{P}^*$. Depending on how the parameter $\phi$ is defined, however, distinct prior distributions on $\Phi$ do not necessarily lead to distinct models $P \in \mathcal{P}^*$. If, on the other hand, the frequentist parameter $\phi$ is also a Bayesian parameter, i.e., a measurable function of the data, then distributions $\pi$ on $\Phi$ will uniquely identify models $P \in \mathcal{P}^*$.

An example of such a case is when $X$ is a sequence of three exchangeable binary variables. Traditional Bayesian analyses and frequentist analyses would model such a sequence as independent and identically distributed Bernoulli random variables. The Bayesian analysis then requires the specification of the distribution of the success probability over the range $[0, 1]$. Our analysis would instead define parameters as functions of the observables. The count of successes is a minimal parameter for this problem. It takes on only four values, 0, 1, 2, or 3, and the prior distribution is specified by giving the probabilities of any three of these values (the fourth is then fixed, since the four probabilities sum to one).

³ Notable exceptions arise in the case when $X$ is an infinite sequence of random variables. Such examples are treated in the sequel.

(2.6) While there is a prior distribution $\pi$ on $\Theta$ for every distribution $P \in \mathcal{P}$, it is not necessary that every distribution $\pi$ on $\Theta$ corresponds to a distribution $P \in \mathcal{P}$. Necessary conditions for this to be the case are: (i) the parameter $\theta$ is minimal; and (ii) the class $\mathcal{P}$ equals $\mathcal{P}^*$, its convex hull under probabilistic mixtures.

(2.7) If a class $\mathcal{P}$ is maximal with respect to the (minimal) parameter $\theta$, then all possible heterogeneous beliefs consistent with the parametric model defined by $\theta$ can be expressed in terms of a member distribution of $\mathcal{P}$.

(2.8) When $\mathcal{P}$ is a "maximal class" of probability models for a given parametrization, the Bayesian parameter is equivalent to a sufficient statistic for $\mathcal{P}$ in the sense of Dynkin (1978). He shows that when such a sufficient statistic exists for $\mathcal{P}$, a convex set of probability measures, then $\mathcal{P}$ contains a subset $\mathcal{P}_e$ of extreme points, and any measure in $\mathcal{P}$ can be characterized by a probabilistic mixture ("barycentre") of extreme measures. Dynkin's set $\mathcal{P}_e$ is just the sub-class of models corresponding to prior distributions which are degenerate at specific values of the parameter.
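The three-variable example in Remark (2.5) can be checked numerically. Given the success probability $p$, the three variables are i.i.d. Bernoulli, so the induced prior on the count is $\pi(k) = \binom{3}{k} E[p^k (1-p)^{3-k}]$, and an exchangeable model on $\{0,1\}^3$ is fixed once these four probabilities are fixed. The sketch compares the uniform distribution for $p$ with a two-point distribution that shares its first three moments; the two-point choice (the two-point Gauss-Legendre nodes on $[0, 1]$) and the function name are ours, for illustration. Both give the same prior on the count, hence the same model, so distinct priors on the frequentist parameter need not lead to distinct models, whereas the count, being a function of the data, identifies the model.

```python
import math

def count_prior(mixing):
    """pi(k) = P(k successes among three binary variables) when, given p, they are
    i.i.d. Bernoulli(p) and p has the discrete distribution mixing = [(p, weight), ...]."""
    return [math.comb(3, k) * sum(w * p**k * (1 - p)**(3 - k) for p, w in mixing)
            for k in range(4)]

# Uniform distribution for p on [0, 1]: pi(k) = C(3, k) * k! (3-k)! / 4! = 1/4.
uniform = [math.comb(3, k) * math.factorial(k) * math.factorial(3 - k) / math.factorial(4)
           for k in range(4)]

# Two-point distribution with equal weights at 1/2 -/+ 1/(2*sqrt(3)); it matches the
# uniform distribution's moments up to order three.
h = 1 / (2 * math.sqrt(3))
two_point = count_prior([(0.5 - h, 0.5), (0.5 + h, 0.5)])

print(uniform)                            # [0.25, 0.25, 0.25, 0.25]
print([round(v, 12) for v in two_point])  # [0.25, 0.25, 0.25, 0.25]
```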

3 Applications with Finite Sequences

Suppose that $X$ consists of a finite sequence of component variables,

$$X = (X_1, \ldots, X_N),$$

having outcome space

$$\mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_N,$$

the Cartesian product of the outcome spaces of the coordinates. Let, as before, $\mathcal{F}$ denote the σ-field for which the Bayesians can specify probabilities.

Example 1: Arbitrary Sequences

First, let $\mathcal{P}$ represent the class of all possible distributions on the measurable space $(\mathcal{X}, \mathcal{F})$.

The minimal parametrization is obtained by letting the parameter field equal the basic field $\mathcal{F}$. The obvious parameter is then the identity function, that is,

$$\theta(x) = x,$$

in which case $\Theta$ is a copy of $\mathcal{X}$ and the field $\mathcal{G}$ is equivalent to $\mathcal{F}$.

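The likelihood function of any $A \in \mathcal{F}$ reduces to the indicator of $A$ interpreted as an element of $\mathcal{G}$, or,

$$L(A, \theta) = \begin{cases} 1, & \theta \in A, \\ 0, & \text{otherwise}. \end{cases}$$

The prior for a distribution $P$ is $P$ itself on $(\Theta, \mathcal{G})$, and the parametric representation becomes the integral of the indicator of $A \in \mathcal{G}$ with respect to $P$.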
Since there is no consensus of opinion within the class of all distributions on $(\mathcal{X}, \mathcal{F})$, the parameter field is not a strict subfield of the basic field, and specifying a prior on $(\Theta, \mathcal{G})$ is equivalent to specifying a distribution on $(\mathcal{X}, \mathcal{F})$.

Example 2: Deterministic Sequences

Suppose that $X$ is a "deterministic" sequence, that is, suppose that the Bayesians in the group agree on a transformation $T : \mathcal{X}_1 \to \mathcal{X}$ such that $X = T(X_1)$. For instance, the components of $X$ could represent the states of a dynamic system progressing from time 1 to $N$ with unknown initial condition $X_1$.

The minimal parametrization is obtained by letting the parameter field equal $\mathcal{F}_1$, the field induced by $X_1$. The projection of $X$ onto $X_1$ then serves as a minimal parameter, or,

$$\theta(x) = x_1,$$

with $\Theta$ equal to a copy of $\mathcal{X}_1$ and $\mathcal{G}$ equivalent to the field on $\mathcal{X}_1$.

The likelihood function is then the indicator of the initial conditions leading to sequences in some event $A \in \mathcal{F}$,

$$L(A, \theta) = \begin{cases} 1, & T(\theta) \in A, \\ 0, & \text{otherwise}. \end{cases}$$

The prior distribution is the distribution of the initial condition $X_1$, and the parametric representation gives the distribution of a sequence as the probabilistic mixture of the possible sequences over the initial condition. This example can be extended in various ways. For instance, instead of finding agreement on a single $T$, the Bayesians may only be able to agree on some set of transformations $\{T\}$, such as the set of all linear transformations. The parametrization will then have to be extended to encompass the field induced by $T$.
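A toy discrete instance of Example 2, under assumptions that are purely illustrative: states are the integers modulo 8, the agreed transformation iterates the hypothetical doubling map $x \mapsto 2x \bmod 8$, $N = 4$, and the prior on $X_1$ is uniform. The likelihood is the indicator that $T(\theta) \in A$, and the parametric representation adds up the prior mass of the initial conditions whose trajectories fall in $A$.

```python
from fractions import Fraction

N = 4          # hypothetical sequence length
MODULUS = 8    # hypothetical finite state space {0, ..., 7}

def T(x1):
    """Hypothetical agreed dynamics: X_{t+1} = 2 * X_t mod 8, started at X_1 = x1."""
    seq = [x1]
    for _ in range(N - 1):
        seq.append((2 * seq[-1]) % MODULUS)
    return tuple(seq)

def likelihood(event, theta):
    """L(A, theta) = 1 if the trajectory T(theta) lies in A, else 0.
    The event A is passed as its indicator (a predicate on sequences)."""
    return 1 if event(T(theta)) else 0

# Hypothetical prior: uniform over the initial condition X_1.
prior = {x1: Fraction(1, MODULUS) for x1 in range(MODULUS)}

def prob(event):
    """Parametric representation: P(A) = sum over theta of L(A, theta) * pi(theta)."""
    return sum(likelihood(event, t) * w for t, w in prior.items())

print(prob(lambda x: x[1] == 0))   # 1/4: X_2 = 0 only when X_1 is 0 or 4
print(prob(lambda x: x[-1] == 0))  # 1: X_4 = 8 * X_1 mod 8 = 0 for every X_1
```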

Example 3: Exchangeable 0-1 Sequences

Suppose that the components of $X$ each take on either the value "0" or "1," and suppose that $\mathcal{F}$ is the set of all subsets of $\mathcal{X}$. Let $\mathcal{P}$ be the class of exchangeable distributions on $(\mathcal{X}, \mathcal{F})$, that is, distributions $P$ that are invariant under permutations of the components of $X$. For instance, $X$ can be thought of as a sequence of "random" drawings without replacement from an urn containing $N$ balls, each marked either "0" or "1."

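The minimal parameter field is generated by the sets consisting of sequences having an equal number of "1"s. A convenient parameter is the relative frequency of "1"s in the signal, that is,

$$\theta(x) = \frac{1}{N} \sum_{i=1}^{N} x_i,$$

having outcome space $\Theta = \{0, \frac{1}{N}, \ldots, 1\}$, with $\mathcal{G}$ the set of all subsets of $\Theta$.

The likelihood function is well known in connection with urn problems. In particular, the likelihood for any subsequence or cylinder set $A = (x_1, \ldots, x_n)$, where $n < N$, is

$$L(A, \theta) = \begin{cases} \dbinom{N-n}{N\theta - \sum_{i=1}^{n} x_i} \Big/ \dbinom{N}{N\theta}, & \text{for } \sum_{i=1}^{n} x_i \le N\theta, \\ 0, & \text{otherwise}, \end{cases}$$

with the convention that $\binom{m}{k} = 0$ for $k > m$.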

In other words, the likelihood is the number of ways of arranging the remaining balls marked "1" after the first $n$ are removed, divided by the total number of arrangements of such balls before the first $n$ were removed. The prior distribution represents the distribution over the composition of the urn, and the parametric representation says that the distribution of any finite sample is a mixture of the distributions conditional on the urn's composition.
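The hypergeometric likelihood can be checked against a draw-by-draw computation of sampling without replacement, and the parametric representation then mixes these likelihoods over the urn's composition. The sketch uses exact fractions; the helper names and the uniform prior on $\theta \in \{0, 1/N, \ldots, 1\}$ are illustrative assumptions. Under that uniform prior, the mixture gives probability $1/12$ to the sample $(1, 0, 1)$ when $N = 6$.

```python
from fractions import Fraction
from math import comb

def urn_likelihood(xs, theta, N):
    """L(A, theta) for the cylinder set A = (x_1, ..., x_n), n < N, when N*theta of the
    N balls are marked 1: C(N-n, N*theta - s) / C(N, N*theta) with s = sum(xs)."""
    n, s, K = len(xs), sum(xs), round(N * theta)
    if s > K or K - s > N - n:  # impossible sample: the binomial coefficient is 0
        return Fraction(0)
    return Fraction(comb(N - n, K - s), comb(N, K))

def sequential_draws(xs, K, N):
    """Same probability, computed draw by draw without replacement from K ones, N - K zeros."""
    p, ones, total = Fraction(1), K, N
    for x in xs:
        p *= Fraction(ones if x else total - ones, total)
        ones -= x
        total -= 1
    return p

N, xs = 6, (1, 0, 1)
assert all(urn_likelihood(xs, Fraction(K, N), N) == sequential_draws(xs, K, N)
           for K in range(N + 1))

# Hypothetical prior: uniform over the urn composition theta in {0, 1/N, ..., 1}.
prior = {Fraction(K, N): Fraction(1, N + 1) for K in range(N + 1)}
print(sum(urn_likelihood(xs, t, N) * w for t, w in prior.items()))  # 1/12
```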

This parametric representation is also known as de Finetti's representation (for finite sequences); see de Finetti (1937).

Example 4: Spherically-Symmetric Sequences

Suppose now that the components of $X$ take on values in the real numbers, and suppose that $\mathcal{F}$ is the Borel field. Let $\mathcal{P}$ be the class of spherically-symmetric distributions on $(\mathcal{X}, \mathcal{F})$, that is, distributions which are invariant under rotations of $X$. For instance, the components of $X$ could represent a set of orthogonal coordinates of some vector quantity, such as the velocity of a point mass in space. If the Bayesians agree that the particular choice of coordinate frame is irrelevant, then $\mathcal{P}$ consists of the spherically-symmetric distributions.

The minimal parameter field is generated by the sets of sequences having the same Euclidean norm; these sets are the surfaces of spheres in $\mathbb{R}^N$ centered at the origin. The second sample moment of the sequence can serve as a minimal parameter, that is,

$$\theta(x) = \frac{1}{N} \sum_{i=1}^{N} x_i^2.$$

The parameter's outcome space $\Theta$ then consists of the non-negative real numbers.

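The likelihood function for cylinder sets of the form $A = [dx_1, \ldots, dx_n]$ is found to be

$$L(A, \theta) = \begin{cases} \dfrac{\Gamma\left(\frac{N}{2}\right)}{\Gamma\left(\frac{N-n}{2}\right) (\pi N\theta)^{n/2}} \left(1 - \dfrac{\sum_{i=1}^{n} x_i^2}{N\theta}\right)^{\frac{N-n}{2} - 1} dx_1 \cdots dx_n, & \text{when } \sum_{i=1}^{n} x_i^2 < N\theta, \\ 0, & \text{otherwise}. \end{cases}$$

(This will be proved in the context of distributions which are uniform on general $\ell^p$-spheres in the sequel. For the present $\ell^2$ case, see also Eaton, 1981.)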
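A numerical sanity check of the likelihood, read as a density in $(x_1, \ldots, x_n)$, uses two classical special cases: for $N = 3$ and $n = 1$ a single coordinate of a point uniform on a sphere of radius $r$ is uniform on $[-r, r]$ (Archimedes), and for $N = 4$ and $n = 2$ two coordinates are uniform on the disk of radius $r$. The sketch also checks that the density integrates to one in a further case ($N = 5$, $n = 1$); the parameter values and helper names are arbitrary choices, and the check is no substitute for the proof referred to above.

```python
import math

def sphere_density(xs, theta, N):
    """Density of (x_1, ..., x_n), n < N, when X is uniform on the sphere
    sum_{i<=N} x_i^2 = N*theta:
    Gamma(N/2) / (Gamma((N-n)/2) (pi N theta)^(n/2)) * (1 - q)^((N-n)/2 - 1),
    with q = sum_{i<=n} x_i^2 / (N*theta) < 1, and 0 otherwise."""
    n, r2 = len(xs), N * theta
    q = sum(x * x for x in xs) / r2
    if q >= 1:
        return 0.0
    const = math.gamma(N / 2) / (math.gamma((N - n) / 2) * (math.pi * r2) ** (n / 2))
    return const * (1 - q) ** ((N - n) / 2 - 1)

def midpoint_integral(f, a, b, steps=20_000):
    """Plain midpoint rule; adequate for a quick normalisation check."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# N = 3, n = 1, radius 1 (theta = 1/3): uniform on [-1, 1], density 1/2.
print(sphere_density([0.3], 1 / 3, 3))                 # approximately 0.5
# N = 4, n = 2, radius 1 (theta = 1/4): uniform on the unit disk, density 1/pi.
print(sphere_density([0.1, 0.2], 1 / 4, 4) * math.pi)  # approximately 1.0
# N = 5, n = 1, theta = 2 (radius sqrt(10)): the density should integrate to 1.
r = math.sqrt(10)
print(midpoint_integral(lambda x: sphere_density([x], 2, 5), -r, r))
```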

The prior distribution consists of the distribution of the second sample moment of the sequence, and the parametric representation gives the distributions in $\mathcal{P}$ as mixtures over the conditional distributions given the second sample moment.

The analysis remains valid for sequences whose components take values in some subset of the real numbers, although the domain of definition of the likelihood and its norming constant may have to be changed. When the components are restricted to the set $\{0, 1\}$, the example reduces to the previous example.

4 Infinite Sequences

Most applications of parametric models in statistics involve infinite sequences of random variables, that is, sequences whose length $N$ is not bounded beforehand. Observations in the infinitely far future cannot be called observable in any practical sense. However, they can be viewed as limits of real observations, and therefore parametric models for infinite sequences are, at best, limits of operationally defined parametric models.

Parameters for infinite sequences are defined as limits of parameters for finite sequences when the length is increased without bound. For instance, for the exchangeable 0-1 sequences in Example 3, we can define the parameter to yield the limiting relative frequency, or propensity, of a sequence, that is,

$$\theta(x) = \begin{cases} \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i, & \text{whenever this limit exists}, \\ -1 \text{ (say)}, & \text{otherwise}. \end{cases}$$

However, it is well known (see de Finetti, 1937) that the set of sequences for which the relative frequency does not converge has $P$-probability zero. This implies, for instance, that a parameter which assigns the value 0 to the non-convergent sequences induces a parameter field differing from the former only on sets of $P$-probability zero. Therefore, we need only consider the parameter space $\Theta$ consisting of the interval $[0, 1]$.

For these values of $\theta$, one finds for the likelihood function for sets $A = (x_1, \ldots, x_n)$ that

$$L(A, \theta) = \theta^{\sum_{i=1}^{n} x_i} (1 - \theta)^{n - \sum_{i=1}^{n} x_i},$$

by taking the limit of the expression in Example 3 as $N \to \infty$. The prior becomes the distribution of the limiting relative frequency, and the parametric representation becomes de Finetti's representation.
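The limit can be watched numerically: for fixed $\theta$, with $N\theta$ kept integral, the hypergeometric likelihood of Example 3 approaches the Bernoulli product as $N$ grows. The sample, the value $\theta = 0.3$, the sequence of $N$ values and the helper names are arbitrary illustrative choices.

```python
from math import comb

def urn_likelihood(xs, K, N):
    """Finite-N likelihood of Example 3, with K = N*theta balls marked 1."""
    n, s = len(xs), sum(xs)
    if s > K or K - s > N - n:
        return 0.0
    return comb(N - n, K - s) / comb(N, K)  # int / int true division of the exact integers

def bernoulli_likelihood(xs, theta):
    """Limiting likelihood theta^s (1 - theta)^(n - s)."""
    s = sum(xs)
    return theta ** s * (1 - theta) ** (len(xs) - s)

xs, theta = (1, 0, 1, 1), 0.3  # arbitrary illustrative sample and parameter value
errors = []
for N in (10, 100, 1_000, 10_000):
    K = round(N * theta)       # keeps N*theta integral, so theta is an attainable value
    gap = abs(urn_likelihood(xs, K, N) - bernoulli_likelihood(xs, K / N))
    errors.append(gap)
    print(N, gap)
```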

Parametric models for infinite sequences can be used in practice as approximations for models for long sequences. For the exchangeable case, an advantage of using this approximation is the fact that the likelihood simplifies to a product measure. A disadvantage is that one now has to specify a prior on the entire interval $[0, 1]$ instead of on just the $N + 1$ points $\{0, \frac{1}{N}, \ldots, 1\}$.